

 comprehensive response


comprehensive response to all comments given. R1.1 However, I worry about the reproducibility since most of the results are run by only once.

Neural Information Processing Systems

Thank you very much for the thorough and generally positive feedback. R1.1 However, I worry about the reproducibility since most of the results are run by only once. For the equation between lines 135 and 136 (why does it not have an equation number?): We will add an equation number. The experiments stop at L=20.


Analysis of Threat-Based Manipulation in Large Language Models: A Dual Perspective on Vulnerabilities and Performance Enhancement Opportunities

Samancioglu, Atil

arXiv.org Artificial Intelligence

Large Language Models (LLMs) demonstrate complex responses to threat-based manipulations, revealing both vulnerabilities and unexpected performance enhancement opportunities. This study presents a comprehensive analysis of 3,390 experimental responses from three major LLMs (Claude, GPT-4, Gemini) across 10 task domains under 6 threat conditions. We introduce a novel threat taxonomy and multi-metric evaluation framework to quantify both negative manipulation effects and positive performance improvements. Results reveal systematic vulnerabilities, with policy evaluation showing the highest metric significance rates under role-based threats, alongside substantial performance enhancements in numerous cases with effect sizes up to +1336%. Statistical analysis indicates systematic certainty manipulation (pFDR < 0.0001) and significant improvements in analytical depth and response quality. These findings have dual implications for AI safety and practical prompt engineering in high-stakes applications.